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Abstract 

One of the greatest data analysis challenges for the Laser Interferometer Space Antenna (LISA) 
is the need to account for a large number of gravitational wave signals from compact binary systems 
expected to be present in the data. We introduce the basis of a Bayesian method that we believe 
can address this challenge, and demonstrate its effectiveness on a simplified problem involving one 
hundred synthetic sinusoidal signals in noise. We use a reversible jump Markov chain Monte Carlo 
technique to infer simultaneously the number of signals present, the parameters of each identified 
signal, and the noise level. Our approach therefore tackles the detection and parameter estimation 
problems simultaneously, without the need to evaluate formal model selection criteria, such as the 
Akaike Information Criterion or explicit Bayes factors. The method does not require a stopping 
criterion to determine the number of signals, and produces results which compare very favorably 
with classical spectral techniques. 
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I. INTRODUCTION 



The Laser Interferometer Space Antenna (LISA) is designed to detect gravitational ra 



1]. The sensitivity of 



diation from astrophysical sources in the 10 -2 mHz to 100 mHz band 
LISA is such that very many such sources should be detectable. Paradoxically, this sensitiv- 
ity will also make signal identification somewhat problematic: parts of the band will likely be 
swamped with tens of thousands of signals. One of the most abundant classes of source will 
be close-by white dwarf binaries, producing signals from 0.1 mHz to 3 mHz. There will be 
source confusion below 1 mHz and resolvable sources above 5 mHz; the 1 mHz to 5 mHz band 
for LISA therefore presents a tremendous data analysis challenge, potentially containing up 
to 10 sources ODD . LISA's ability to detect and characterize other astrophysical sources 
will be greatly helped if the thousands of background signals from binary systems can be 
identified. For a detailed look at the population of binary systems that produce signals in 
LISA's operating band, and how they affect LISA's performance, we direct the reader to 

q n 

Barack and Cutler 2} and to Nelemans et al. 

We approach this problem from a new direction. Markov chain Monte Carlo (MCMC) 
techniques have been demonstrated to be especially suited to parameter estimation problems 
involving numerous parameters p . We have previously used the Metropolis-Hastings (MH) 
algorithm [?], Q in other gravitational radiation problems, such as estimating astrophysical 
parameters for gravitational wave signals from coalescing compact binary systems |9( or 
pulsars Q, d|. It is our belief that MCMC methods could provide an effective means 
for identifying and characterizing the thousands of background binary signals to be found 
within the LISA data. 

The method that we present in this paper is not a source subtraction approach 12], but 
one that identifies and characterizes binary produced periodic signals in the data. Signals 
that are sufficiently large in amplitude will have their parameters estimated. Sources that 
are weak will contribute to the noise; our method also produces an estimate of the overall 
level of the noise. We show that the noise level estimate from our method depends on the 
inherent detector noise level, and also the presence of unidentified signals. MCMC methods 
are robust and dynamic, and we believe that ultimately it will be possible to use them with 
LISA data to estimate simultaneously the parameters associated with a wide range of source 
types occurring in the presence of many thousands of white dwarf binaries. 
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In this paper we present the results of a simulation study, comprising a data stream of 
m sinusoidal signals embedded in gaussian noise. Although our simulation study is simple, 
it does highlight a number of issues relevant to the real LISA data analysis problem. For 
example, the number of signals present in the data, m, will be unknown. 

Our Bayesian approach does not need to fit each model with m signals, for m = 1, . . . M, 
and then select the best fitting model via the evaluation of Bayes factors. The evaluation of 
Bayes factors [lsI . Q] requires computation of the marginal likelihoods and thus marginaliza- 
tions over the parameter vectors of each model. This is a formidable computational problem 
when the dimension of the parameter space is large. A shortcut to the calculation of Bayes 
factors, the harmonic mean of the likelihood values 15], is known to be unstable because 
the inverse likelihood does not possess a finite variance. Other large sample approximations 
to the Bayes factors such as the Bayesian Information Criterion (BIC), also referred to as 
Schwarz Criterion, and the related penalized likelihood ratio model choice criterion, Akaike 
Information Criterion (AIC), have been shown to be inconsistent when the dimension of the 

n 

parameter space goes to infinity The newly developed deviance information criterion 
(DIC) is known to be controversial in mixture models. While MCMC algorithms like 
the Gibbs sampler and MH algorithm yield posterior distributions of the parameters, they 
do not provide marginal likelihoods. An indirect method of estimating marginal likelihoods 
from Gibbs sampling output has been developed and extended to output from the MH 
algorithm These, however, are impractical when the number of candidate models is 

very large, as is the case for LISA data. Therefore, we use another strategy, the reversible 
jump algorithm j^J, that samples over the model and parameter space in order to estimate 
posterior model probabilities/marginal likelihoods. We consider the number of sinusoids as 
an additional parameter and determine its marginal posterior distribution. Its modal value 
will give us the model with the most probable number of sinusoids. As illustrated in Section 

n 

II, a Bayesian analysis naturally encompasses Occam's Razor |2Q| and a preference for a 
simpler (smaller m) model. We thus address and solve the detection and estimation of sinu- 
soids simultaneously. In addition, our MCMC method is better than a classical periodogram 
at resolving signals that are very close in frequency, and we provide a detailed discussion of 
how to identify these signals. Finally, our method infers the noise level in the data together 
with the parameters of each of the m sinusoids. 

The problem of identifying an unknown number of sinusoids is neither new nor simple 
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22J . Whereas previous studies have looked for a handful of unknown signals, here we 

show results for 100 signals. Another benefit of using MCMC methods is that computation 

time scales roughly linearly as the number of parameters increases, and does not show an 

n 

exponential increase in time fa]. In the future we will make the character of the signals more 
realistic, taking into account the orbit of the LISA satellites and the nature of the inspiral 
of binaries. With our study of sinusoids we hope to inform LISA data analysis researchers 
of another possible avenue for characterizing a large number of background signals. 

The rest of the paper is organized as follows: Section |Il] illustrates that Bayesian infer- 
ence automatically implements the principle of Occam's Razor. Our Bayesian method and 
posterior computational algorithms are described in Section lllll In Section I1VI we present 
the results of this study using synthesized data. We believe that this method offers great 
hope for signal extraction in real LISA data, and this point is discussed in Section El 



II. OCCAM'S RAZOR 



The notion that Bayesian inference naturally implements a quantitative version of Oc- 
cam's Razor j^j is well known, but this property is sufficiently central to our approach to 
LISA analysis to warrant a very simple demonstration: 

We imagine two physical models, M.\ and M. 2 constrained by a single datum, d. Aii has 
one parameter, ai, to describe the datum. Ai 2 uses the sum of two parameters, s = a\ + a 2 
to describe the same datum. Clearly, all other things being equal, if the datum is equally 
consistent with both models, we would prefer Aii over Ai 2 on the grounds that Aii is 
'simpler'. To see how this naturally occurs within the Bayesian framework we proceed by 
considering the ratio of probabilities (the odds ratio) of the two models: 

pjM^d) _ p(M 1 ) P {d\M l ) 

p{M 2 \d) P (M 2 )p(d\M 2 )' 1 ' 

If we set the prior odds ratio to unity, reflecting no prior preference for either model, the 
right-hand side of the above equation is simply the ratio of marginal likelihoods (sometimes 
called the evidences) of the two models. This ratio is conventionally known as the Bayes 
factor for the comparison. We will take the priors for our parameters a x and a 2 to be each 
uniform in the range — > R. Under M.\ we therefore have p(a\\M.i) = 1/R. The likelihood 
under A4i is assumed to be p(d\a\, A^i) = 5(d — ai), where 5 denotes the Dirac function, 
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i.e. S(x) = 1 if x = and otherwise. This yields 



p(d\Mi 



R 



R 



5(d — ax) da\ 



1/R 0<d<R 
otherwise. 



(2) 



Under M2 with parameter s = cq + a2, the prior probability density function (pdf) for s is 
the convolution of the two uniform prior pdfs for cq and a2 and therefore has the 'triangle' 
form 

s/R 2 0<s<R 
2/R-s/R 2 R<s<2R, 



p{s\Mi 



(3) 



so the evidence for M. 2 is 



p2R 

p(d\M 2 )= p(s\M 2 )5(d - s) ds = { 
Jo 



d/R 2 0<d<R 
2/R-d/R 2 R<d<2R 
otherwise, 



(4) 



(see Fig. [TJ). If the datum lies in the range < d < R, the Bayes factor is (1/R) /(d/R 2 ) 



1/R 



p(d\Mi) 






^\^(d\M 2 ) 


y 


^ ^ d 



R 



FIG. 1: The evidences for models M\ (solid line, one parameter fit) and M2 (dashed line, two 
parameter fit) as a function of the datum d. Note that the evidence ratio always favors the simpler 
model A4\ when the datum is equally consistent with either. 

R/d, and Ai\ is always favoured over M.2- li d = R neither model is preferred, and if 
R < d < 2R then M.2 (as the only viable model) is a relative certainty. We now see why 
M\ is more probable than M2 when the datum is equally consistent with both (i.e., d < R): 
M2 has more flexibility than is necessary to explain the datum and so penalizes itself by 
spreading its evidence more thinly. We can also see that the effect is not simply dependent 
on the number of parameters. If M2 depended on just a single parameter but with a uniform 
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prior on this parameter over — > 2R, then it too would be discriminated against (in the 
case < d < R) for using more parameter space to do the same job as M.\. 

This quantitative ability to identify the least flexible model consistent with the data makes 
Bayesian inference the natural methodology in problems where the number of parameters 
is unknown. Indeed we do not need to consider explicitly Bayes factors at all if we consider 
the number of parameters itself to be a parameter, and determine its marginal posterior 
probability in the usual way This is the approach we will apply to the the LISA white 
dwarf binary problem, as described in the next section. 



III. EXTRACTION OF SIGNAL PARAMETERS 



We consider a signal consisting of m superimposed sinusoidal signals where m is an 
unknown parameter. Therefore we confine our attention to a set of models {A4 m '■ m £ 
{0, • • • , M}} with M being the maximum number of sinusoidal signals we allow. Let d = 
[d\, ■ • • , <ijv] be a vector of A" samples recorded at times t = [ti, • • - , ijv]- Model A4 m takes 
the observed data to comprise a signal, plus noise, = [e[ , • • • , e ™ ]'■ 

dj = s^fo, a m ) + 4 m) , for j = 1, . . . , AT (5) 

where the noise terms €j are assumed to be i.i.d. N(0, cr^J random variables. The signal of 
model A4 m has the form 

m 

s^(t v a m ) = [4 n) cos(2vr/f ) t J ) + sin(27r/^)] , (6) 
t=i 

so that each sinusoid component is characterised by one frequency and two amplitudes. 
Model Ai m is therefore characterized by a vector of 3m + 1 unknown parameters which we 
denote a m = [A^, B[ m \ / x , • ■ ■ , A^, Bm \ fm \ <J m \. The objective is to find the model 
M. m that best fits the data and to estimate its parameters. We use a Bayesian approach 
as in j^j, but instead of fitting each model separately and selecting the best using Bayes 
factors, we treat the number m of unknown sinusoids as an additional unknown parameter 
and determine the mode of its marginal posterior distribution (together with the posterior 
distribution of all the other parameters). The joint probability that these data d arise from 
the parameter vector a m and model M. m is given by 

p(d\m,a m ) oc ^exp { [dj - s im) (tj , a m )] 2 } . (7) 



For simplicity we take the prior distribution of the model dimension parameter m as uniform 
over {0, . . . , M}. The variances cx^ are given noninformative inverse gamma priors, discussed 
in Section Hi! CI 

Again, for simplicity our calculations use dimensionless frequencies with a Nyquist fre- 
quency of 0.5, and we use uniform priors for the component frequencies over [0,0.5]. 

Given m, and the frequency vector .p , our model (0) is a linear regression model 
which can be written in matrix form as 

where 6 {m) = [A^\ B{ m) , A { ™\ B { ™\ . . . , A^ 1 \ is the vector of 2m amplitudes and 

the N x 2m matrix D contains the entry cos(27r/j tj) in row % and column 2j — 1, and 
sin(27r/j m) t0 in row i and column 2j for i — 1, . . . , N and j = 1, . . . , m. Thus, an obvious 
choice for the prior distribution of the amplitudes would be a g-prior j^, a multivariate 
Normal distribution with mean zero and covariance matrix equal to g x (D'D)^ 1 . This 



g-prior was used in 22] for situations with m < 4. However, this choice becomes impractical 
for a large number m of signals since each iteration of the MCMC algorithm would require 
the calculation and inversion of the 2m x 2m covariance matrix of basis functions. Therefore, 
for our synthesized data we chose uniform priors for the amplitudes A^, with ranges 
A[ m) G [-A max , A max ] and B^ G [-B mH , 5 max ] and with A max = B max = 5. 

Note that these priors are not uninformative, and we express the parameter space in 
Cartesian coordinates simply because this is convenient for the implementation of the MCMC 
algorithms. A model expressed in polar coordinates with uniform priors on amplitude and 
phase would correspond to a different prior distribution for the transformed amplitudes in 
Cartesian coordinates. The effect of these different priors on the frequency estimates is 



generally negligible when the data is informative, and is discussed in 25]. The posterior 
pdf of the frequency reaches its maximum at the same value for both prior choices but with 
different curvatures at this maximum. 

By applying Bayes' theorem, we obtain the posterior pdf for our model parameters of 

p(m,a m )p(d\m,a m ) 
p(m, a m \d) = — , (9) 

where p(d) = J2iLo I p( m , a m)p(d\m, a m ) da m . The direct evaluation of the normalization 
constant p(d) is difficult due to the (3m + l)-dimensional integration involved. Moreover, 



the computation of marginal posterior pdfs would require subsequent 3m-dimensional in- 
tegration. To overcome this problem, we use sampling-based MCMC techniques to carry 



out posterior inference (see p for an introduction and overview.) These only require the 
unnormalized posterior p(m, a m \d) oc p(m, a m )p(d\m, a m ) to sample from Eq. © and to es- 
timate the quantities of interest. However, in the present context the overall model does not 
have fixed dimension and classical MH techniques j?l cannot be used to propose trans- 
dimensional moves. We therefore use the Reversible 
(RJMCMC) algorithm for our model determination 
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ump Markov Chain Monte Carlo 
28]. Additionally we use the de- 
layed rejection (DR) method for transitions within the same model. This allows 
better adaptation of the proposals in different parts of the state space by allowing the choice 
of the proposal distribution to depend on the proposed but rejected state as well as the 
current state. 



A. The RJMCMC for model determination 

To sample from the joint posterior p(m, a m \d) we construct a Markov chain simulation 
with state space U m=1 {m x iR 3m+1 ) where m is the current number of signals. When a 
new model is proposed we attempt a step between state spaces of different dimensionality. 
Suppose that at the nth iteration of the Markov chain we are in state (k, a^). If model M.y 
with parameter vector a' k , is proposed, a reversible move has to be considered in order to 
preserve the detailed balance equations of the Markov chain. Therefore the dimensions of 
the models have to be matched by involving a random vector r sampled from a proposal 
distribution with pdf q(r), say, for proposing the new parameters a' k , = t(a&,r) where t is 
a suitable deterministic transformation function of the current state and r. Here we focus 
on transitions that either decrease or increase models by one signal, i.e. k' 6 {k — 1, k + 1}. 
We use equal probabilities Pk^k' — Pk'^k to either move up or down in dimensionality, and 
without loss of generality we consider the upward move k' — k + 1. 

If the transformation t k ^k' from (a,k,r) to a' k , and its inverse t k ^ k , = tk'^k are both 
differentiable, then reversibility is guaranteed if we define the acceptance probability for 
increasing a model by one signal as 

p(a' k ,,k')p(d\a' kl) k')pk^k' \ ,j , ^ 
' p(ak,k)p(d\a k ,k)q(r)p k/ ^ k J k " k ' 



ak^k'{a' k '\ a k) = min 



where \Ji 



k^->k' 



dt(a k ,,r) 



(a k ,r) 



is the Jacobian determinant of this transformation 



ion 3- 



In this 



context, we suggest two types of transformations, 'split-and- merge' and 'birth-and-death' 



Split-and-merge transitions 



For a 'split' transition we randomly choose one of our signals with parameter subvec- 
tor ci(j) = (A\ , , ) from a k . This signal is chosen by sampling i uniformly from 
{!,..., k}. The proposed parameter vector a' k , comprises all the other (k — 1) subvec- 
tors of a k and two additional 3-dimensional subvectors, say a',^ = (A^\ B^\ ) and 
a'us = (A^ , b!£ \ f£ ) eacn with half the amplitude of a^, but same frequency, to re- 
place O(j). A three-dimensional Gaussian random vector (with mean zero), r = (r^, 7*5,7*/), 
changes the current state to the two resulting states o,', U y a', 2i -> through a linear trans- 
formation 



tfch-»fc'(a(i), r) 





4f + r A \ 
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Bf ) + r B 
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f } +r f 




Ak') 

Jil 
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4f } - r A 
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Bf ) - r B 
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t-rs ) 




{ tsr ) 



The inverse transformation t k ^ k , 
written as 



tk'^k accounts for the merger of two signals and can be 



tfc'^fe(a'(j 1 ), ct( 



( aW + aW \ 






s?p + 4 fc,) 






1 , 1 Ak 1 ) 
2->ii ' 2->i2 






K4*' 1 - A V) 




TA (k) 






r B (k) 






[ r/ (*) J 



Note that the determinant of the Jacobian of the transformation t kh ^ k i, (i- e - the determinant 



of the above 6-by-6 matrix) is | J) 



k^k' 



2, and that of its inverse is 1/2. Thus, the acceptance 



probability for increasing a model by one signal is given by 
a k „ k >(a' k ,\a k ) = min 



p(k,a {i) )p(d\a {i) ,k)q(r) 



(11) 
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By analogy, the acceptance probability for the reversible move of a fusion of two signals is 
given by 

/ i m ■ f n v(k,a {i) )p(d\a {i)l k)q(r) . 
BM( * kl = ^tV,^,,^^"^*" > ■ (12) 

9(a w ,r) 



where J^fc — , )jfr f »< , 

We use a multivariate normal proposal distribution, N[0, diag(cx^, a 2 Bl cr|)], for g(r). Suit- 
able values for the variances of this multivariate normal distribution can be chosen by con- 
sidering their effect on the acceptance probabilities. First, we consider the effect of a small 
proposal variance for a splitting transition. In this case the proposal has an insignificant 
effect on the likelihood since the two new signals in the model function are almost linearly 
dependent. On the other hand, the resulting large value for q(r) considerably decreases the 
acceptance probability when proposing a split, and increases it when a fusion of two signals 
is proposed. Now we consider the effect of a large proposal variance: the value of q(r), and 
therefore its influence on the acceptance probability, is moderate. However it causes the 
likelihood to change considerably, resulting in a small acceptance probability. The choice 
of cr\, a 2 B and a 2 is therefore an important consideration for improving mixing. In each 
iteration we set a\, a 2 B equal to the noise level cr^ of the current model m. 

The posterior precision of the frequency in a single-frequency model depends on the signal 
to noise ratio 7 = a/ {A 2 + B 2 )/a 2 and the number of samples N of the data set [31[ . Using 
a Gaussian approximation to the posterior pdf of the frequency [31], its standard deviation 
is given by cr'j = (27T7) _1 ^/48/A^ 3 . This yields a distance in frequency for which two 
sinusoids can still be identified as distinct, neglecting the interference of other sinusoids. We 
therefore use a'j as a frequency perturbation when splitting two sinusoids and use a normal 
distribution with this particular standard deviation when merging. 



Birth-and-death transitions 



A 'birth' transformational step simply creates a new signal with parameter triple a',* in- 
dependent of other existing signals in the current model M.k- The one-to-one transformation 
in this case is very simply given by t^yir) — r — a',*. The inverse ('death') transformation 
that annihilates signal i', t^ k , := tk'^ki nas form ty^k \ a 'u) ) = a 'u) = r - The Jacobian for 
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both of these is 1. The acceptance probability for the creation process is therefore 



p(k')p(a' {i) )p(d\a' k „k') 



mm 



and that for the annihilation process is 



a k >^ k (a k \a k 



p(k)p(d\a k , k)q{r) 



p(k)p(d\a k , k)q(a' (i 



(13) 



mm 




(14) 



p(k')p(a' {i) )p(d\a k ,,k') 

As for split and merge transitions, a bold proposal distribution q results in a small 
acceptance probability due to the strong effect on the likelihood, whereas timid proposals 
have minor effects on the likelihood but are often rejected due the higher values of the 
proposal distribution. An effective way to make a proposal for the frequencies is to base it 
on the Schuster periodogram of the data j^J , given by 



C(f) 



N 



[R(f) 2 + i(f) 2 } 



(15) 



where R{f) = Ylj=i dj cos (2irftj) and /(/) = Ylj=i dj srn finftj) are the real and imaginary 
parts from the sums of the discrete Fourier transformation of the data. This technique has 
already been applied successfully by We use proposals from a normal distribution for 
the amplitudes with zero mean and a standard deviation derived from the average amplitude 
of all remaining amplitudes of the current state of the Markov chain. 

Classical MCMC methods could be used for transitions within a particular model A4 m , 
however we use an adaptive MCMC technique here. The delayed rejection (DR) method 



has been introduced by |29|, |30|, 33] and several of us have successfully applied it to estimate 



cy d 



the frequency and frequency derivative of potential gravitational radiation signals produced 
by a triaxial neutron star 



B. The delayed rejection method for parameter estimation 

As sampling progresses, suppose that at the nth iteration the state of the Markov chain 
is a = a m from model Ai m . We can choose a new state within the same model by first 
sampling a candidate state a' from a proposal distribution qi(a'\a) and then accepting 
or rejecting it with an MH-probability aii(a'\a) depending on the distribution of interest. 
When a proposed MH move is rejected, a second candidate a" can be sampled with a different 
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proposal distribution q 2 (a"\a', a) that may depend on the previously rejected proposal. To 
preserve reversibility of the Markov chain and thus to comply with the detailed balance 
condition, the acceptance probabilities for both the first and the second stage are given by 

p{a')p{d\a')q 1 {a\a') 



ai{a | a) = mm 
and 



p(a)p(d\a)q 1 (a'\a) _ 



(16) 



p(a")p(d\a") qi (a> 


a")q 2 (a 


a',a")[l - ai(a' 


\a")} 


p(a)p(d\a)q 1 (a' 


a)q 2 (a" 


a, a')[l — ai(a'\ 


a)} 



a 2 {a \a,a) - mm 1, rr^n 7 ,\ \ i „\ am TT\ \i i ■ ( 17 ) 



We therefore apply distinct types of DR transition for the amplitudes and the frequency of 
a sinusoid, and these are considered below. The transitions are performed randomly and 
with equal probability for a randomly chosen sinusoid i. 



Proposing new amplitudes 



Our amplitude proposal distribution is multivariate Normal, with a covariance matrix 
that depends on the basis functions of the sinusoids. The closer two sinusoids are in frequency 
the more correlation there is in their recovered amplitudes. 

In (22! the amplitudes were regarded as nuisance parameters and integrated out by treat- 
ing them as parameters of a multiple regression model with a conditional posterior that is 
normally distributed with known mean and covariance matrix according to the g-prior. If 
deemed necessary the amplitudes can be updated by Gibbs sampling. However, the com- 
putation of the covariance matrix involves determining the inverse of D {m \f) 7 '£> (m) (/) for 
each iteration, which is impractical considering the number of sinusoids expected for LISA. 
Mindful of this we consider the interference between the sinusoids by proposing those pairs 
of sinusoids that have the strongest correlation, namely sinusoid pairs which are closest in 
frequency. Therefore for the randomly chosen sinusoid i we also involve the sinusoid i c which 
is the closest in frequency. The matrix for this subset of basis vectors reduces to 



( cos(2 7 r/f ) t 1 ) sin^Tr/f^) cos(2 7 r/f ) t 1 ) sin(27r/f ^0 N 



D 



(m) 



(/) 



cos(2vr/ 4 (m) t 2 ) sin(2vr/rt 2 ) cos^/^) sin(27r/^ 



? (m) 



(m) , 



y C os(2nf^t N ) sm(2n fr>t N ) cos(27rfC't N ) sin^C'V), J 



(m) , 



,(m) , 



(18) 
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which results in a covariance matrix merely of type 4x4 for computing proposals for 
a sinusoid pair (i,i c ). Now suppose that = (A\ m \ B^ m \ A^™\ j is a vector 

containing the amplitudes of the sinusoid pair (but not their frequencies). Then 



E (Mc) 



<o «i,(/)) T (^S)(/) T ^S)(/))" (19) 



is the covariance matrix of the proposal distribution q\{a', i i\\a>(i,i c )) = ^(°(i,ie)> £(i,i c ))> 
where r is a factor controlling the step size of the proposal. 

If the first proposal is rejected, another attempt is made by proposing the am- 
plitudes of a single sinusoid i without considering correlations. The variance of 
the proposal is assessed by the mean amplitude of all signals which is given by 
d^{a) = (2m)- 1 YZA( A i m) ) 2 + ( B i m) ) 2 \- This y ields the second Proposal q 2 (a'( i) \a) = 
N [d(i), r ■ diag (a^ m \a), a^ m \a))] where = ^A.\ m \ B^ J is the subvector of a con- 
taining the amplitudes of sinusoid i. 



Proposing a new frequency 

A new frequency is proposed as follows: In the first stage a new frequency for si- 
nusoid i is sampled from a proposal density that is proportional to the periodogram 
gi(a^) oc (0,0,C(/)) T and independent of the actual stage. This is similar to the sinu- 
soid proposal scheme for RJMCMC. The main objective of this stage is to coarsely scan the 
whole parameter space for frequencies. 

The event of a rejection suggests to sample from the local frequency mode and a pro- 
posal is made conditional on the actual state of the frequency by slightly perturbing the 
state. In the same manner as in the split-and-merge transition the perturbation is oriented 



on the possible achievable accuracy a = (2tt) 1 \j A%a 2 m (A ( f n ' > ) 2 + [B^ m ' v2 



N~ 3 of a 



frequency by j^jj. This yields the proposal 52 (a^l = N (a^, diag ^0, 0, a^ m )^j and 
aims to draw representative samples from the local mode in the second stage. 
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C. Updating the noise parameter 



The sum of the squared residuals between the model and the data, taken from the likeli- 
hood in Eq. is 

N 

S 2 = ^[d r a;.Ml 2 (20) 
i=i 

Using this, we choose a vague prior for the noise parameter a 2 m , defined by IG(a,f3) = 
IG(N p /2, N p ■ Sp/2) with a shape parameter a = N p /2 = 0.001 and a scale parameter 
P = NpSl/2 = 0.001. This yields N p = 0.002 and Sj = 1 for the parameters of the vague 
prior. The full conditional distribution 

m, « Tr ,f N p + N N p S 2 p + S 2 \ 
p{v m \m,a m ,d) oc IG I , ^ I (21) 

is used for drawing samples for <r^ in a Gibbs update. 
D. Initial values 

The initial values of a Markov chain are crucial for the length of the burn-in period needed 
to converge to the real posterior distribution. We could start with and empty model, A4 , but 
it is obvious that it would then take the sampler many steps to find all the signals. Instead, 
we perform a Fast Fourier Transformation (FFT) of the data and use this to generate our 
initial values. We use the frequencies corresponding to the local maxima in the periodogram, 
fo,i, as starting values for / m -° and A Q)i = 2R(f mBXi )/N, B Q)i = 2I(f max .)/N as starting values 
for v4™° and B™°, respectively. Theoretically we could use all the local maxima as initial 
values, but as most of them are due to noise we select only those that exceed a certain 
threshold. We set this threshold low, as it is easier to delete non-relevant sinusoids than 
create good ones. 

The frequency resolution depends on sample size N, suggesting that the convergence 
of the Markov chain is also dependent on N. As we will see, spectral estimates based on 
the FFT are significantly worse than those we obtain by our MCMC method, but they are 
sufficient to serve as initial values. 
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E. Identifying the sinusoids 



Although the RJMCMC method enables us to select the most probable model, we still 
encounter the label- switching problem. This is a general problem caused by the invariance 
of the likelihood under relabelling of the sinusoidal components, and has been extensively 
discussed in the context of mixture models 3^|. During the MCMC simulation, parameter 
triples are constantly changing their affiliation to individual sinusoids, either due to the 
creation and annihilation of sinusoids or following transitions within the same model. We 
therefore need an additional step in our analysis if we are to break this symmetry and 
talk meaningfully about individual sinusoid components. This step involves associating the 
samples in the final Markov chain with particular sinusoids, which we know neither by 
number nor by location. 

The sinusoids that are contained in the model A4, with the highest posterior probability 
of m, are permutations of rh coexistent sinusoids from a list of unknown size. We will 
assume therefore that this list is the same size as the upper limit m rm max of the marginal 
posterior of m. However, in practice the coexistent sinusoids in M. are usually clear, and 
it is rather unlikely that others present in higher order models could replace them. The 
parameter vector sampled in each iteration of the Markov chain (corresponding to model 
Ai) is a permutation of rh parameter triples determining rh out of m max sinusoids. The 
problem is to determine which parameter triple belongs to which sinusoid. We find that the 
parameter that contributes significantly to identifying a sinusoid is its frequency. Thus in 
order to obtain the dominant sinusoids that characterize model Xi we calculate the marginal 
posterior of the frequency and obtain the rh strongest peaks together with their frequency 
ranges by finding the threshold that separates those peaks. 

Since we have to deal with a vast number of output samples grouped within very small 
regions we cannot apply kernel density estimates or histograms, as the required fixed bin 
size for a histogram would be too small to be feasible. Instead, we use a variable bin 
size and calculate densities for a fixed number of samples per bin. We initially sort all 
individual parameter triples for all MCMC samples of the considered model M. by frequency. 
If we have generated n MCMC samples of model Ai during a run then we have to deal 
with van parameter triples and hence fi < ... < f n rh ordered frequency samples. The 
density can be assessed by calculating the frequency range spanned by a fixed number of 
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sorted frequencies. The rhn parameter triples are from m sinusoids and therefore we can 
expect n frequency samples per peak since we assume the number of peaks to be similar 
to the number of sinusoids. Hence the fixed number of sorted frequencies that spans the 
frequency ranges must be some fraction, r, of the number of MCMC samples n, and is 
the counterpart to the required bin width in a histogram or the bandwidth h of a kernel 
density estimate f(/) = (2/mm) _1 Y^7=i hf-fi\<h with uniform kernel, where I is the indicator 
function. The advantage of this approach is the automatic adaptation of the bandwidth to 
the situation by involving the information of the expected parameter triples per peak. We 
found r = 0.05 (5%) to be a good value, and hence rn serves as an estimate of the number 
of samples needed for assessing the spans of the frequency ranges. In analogy to the kernel 
density estimate with uniform kernel, where the number of samples are counted that fall 
into a range of length 2h, the density value pj corresponds to each frequency sample j 
and its rn — 1 subsequent samples that fall into a frequency range of length (/ J+rn _i — fj). 
Hence the density is given by pj = rn/ [nrh (fj +rn _i — fj)} = rj [m(/j +rn _i — fj)} where 
j = 1, • • ■ , n(m — r). Since each pj comprises the samples j, ■ ■ ■ ,j + rn — l this has to be 
considered later when deriving spans for the peaks. We find the smallest density threshold 
/ that separates m distinct peaks with respect to the values of pj. The frequency range 
for each peak k G {1, • • • , m} is [fj ktBtait , fj ktead +m-i] where j k , s taxt and j fciend are the indices 
of the first and the last member of the set of p^-'s in peak k, respectively. Due to the 
fact that we always focus on frequency ranges that contain a fixed amount of frequency 
samples we efficiently deal with large frequency ranges of low density. This technique is 
fast and requires a minimum of memory. The greatest computational cost is in sorting the 
frequencies, although this can be carried out fairly quickly using a heap sort. 

It is still possible that individual peaks contain more than one sinusoid or even none. This 
can be assessed by the histogram of the number of sinusoids in the MCMC samples for the 
restricted frequency range under consideration. To separate more than one present sinusoid, 
we then consider the two amplitudes and apply an agglomerative hierarchical cluster analysis 
that involves all three parameters. We use a modified Ward technique [35] that minimizes 
the within-cluster variance using a normalized Euclidean distance between the parameters 
by adjusting the frequency range to the much larger range of the amplitudes. We use the 
software package R j^l for this task. The Ward technique starts with each parameter 
triple belonging to a singleton cluster. Iteratively, cluster pairs are joined that produce 
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the smallest possible increase in within-cluster sum of squares. In this particular case we 
have no difficulty in detecting when to stop the agglomeration, which is an important issue 
in cluster analysis, as we know the expected number of sinusoids in a peak and hence the 
number of clusters. However, due to the vast number of parameter triples involved here it 
is not possible to carry out a cluster analysis simultaneously on all of the data. Instead we 
have to divide the set of samples into randomly chosen subsets of equal size and perform 
separate cluster analyses (with R) for each of these subsets. Finally, we perform a cluster 
analysis on the median points of the subset clusters to allocate each of those clusters to a 
super-cluster. Each single parameter triple is then allocated to a super-cluster and hence to 
a presumed sinusoid. Those parameters which fall into small peaks below the threshold can 
be allocated to an additional noise contribution. 

IV. SIMULATIONS 

We created an artificial data set of 1 000 samples containing 100 random sinusoids in 
gaussian noise. The amplitude coefficients were chosen randomly in the range [— 1 . . . 1] 
and the noise standard deviation was a — 1, making our maximum signal to noise ratio, 
7i = \J (A? + Bf)/a 2 , about 1.4. We will present our results with dimensionless units. We 
chose a uniform prior for m on {0, 1, 2, . . . , M = 60 000}, and set A mSuX = B mSuX = 5. The 
Markov chain ran for 3 x 10 8 iterations, the first 5 x 10 6 of which were considered as burn-in 
and discarded. The chain was then thinned by storing every 2 000th iteration. The MCMC 
simulation was implemented in C on a 2.8 GHz Intel P4 PC and took about 78 hours to run. 
Fig. |2f a) gives the histogram of the marginal posterior model probabilities obtained by the 
reversible jump algorithm. As each model Ai m is characterized by a different noise level a m , 
we have also plotted the marginal posterior distributions of the noise standard deviations 
in Fig. Efb). Note that a m decreases with higher model order m since a model comprising 
more sinusoids accounts for more of the available power. 

All subsequent results presented here are based solely on MCMC samples corresponding 
to model A4g5, so we will omit the superscripts and denote the parameter vector of model 
by (At, Bi, fi, . . . , A95, B 95 , / 95 , Og 5 ). The power spectral density of the signal can be 
estimated from the product of the conditional expectation of the energy of each sinusoid 
i given its frequency fi, E(Af + Bf\d,m, fi), and the posterior pdf of fi given the data, 
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FIG. 2: (a) shows the model probabilities deduced from our analysis. The model corresponding 
to 95 sinusoids has highest probability. Each model has a different noise level, and (b) shows the 
estimated noise standard deviation and their confidence intervals for the different models. The 
dotted line indicates the real noise level. As model 100 has a very low probability only two MCMC 
were obtained during the run of 3 x 10 8 iterations, making it poorly determined. 

p(fi\d) as described in j^jj. An advantage of using a Bayesian spectral analysis is the 
subsequent ability to calculate confidence areas for the spectrum by grouping our MCMC 
samples and calculating posterior confidence intervals for frequency bins. An appropriate 
width for the bins can be estimated using the frequency resolution of the spectrum, 07 = 
(27r7) _1 (48/A^ 3 ) 1,/2 , given by [3]. In this example the choice of 20 000 bins is sufficient to 
resolve sinusoids of 7 < 1.4. 
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Fig. |3] compares the known real sinusoids from which the data set had been created, the 
Bayesian spectral density estimate as described above, and the classical periodogram derived 
from Eq. We note here that Fig. Efb) and Fig. E^c) are spectral densities, whereas 
Fig. El^a) is a plot of the component energies, defined as N(A? + Bf)/2. The theoretical 
spectral density of these components would consist of delta functions of formally zero width 
and hence infinite density. Furthermore the accuracy of the Bayesian spectral estimate is 
better than the one obtained from the periodogram, which also also leads to different values 
on the ordinate. The Bayesian spectrum estimate usually reveals much higher values in 
the density since the same energy contribution from a sinusoid is distributed over a smaller 
frequency range. However the locations of the peaks can be compared directly, which is the 
main purpose of Fig. El 

We have picked out three different frequency ranges for special attention. Each contain 
two close sinusoid pairs and we will refer to these as Region-1:/ G [0.0486,0.0506], Region- 
2:f G [0.1375,0.1395], Region-3:/ G [0.3594,0.3614]. Joint posteriors for these regions are 
presented in Fig. |U 

In Region-1 we see two sinusoids with a frequency gap of approximately 1/N = 0.001 
which, for a discrete Fourier transform, is the size of a single frequency bin. Region-2 
and Region-3 contain sinusoids with smaller frequency gaps of 0.000 413 and 0.000 517, 
respectively. The posteriors for Region-1 are strongly peaked at the true values. The same is 
true for Region-2 and Region-3, but we observe a larger uncertainty in their frequencies and 
even more in their amplitudes especially when the frequencies of the sinusoids are very close. 
The reason for this is clear: when the frequencies of the sinusoids are very close together they 
slowly beat against each other, and for these particular signals the beat period is greater 
than the observing time so that their total energy is difficult to assess. 

The spectral estimates for these regions are shown in greater detail in Fig. Rather 
than scale the frequency posterior by the expectation of the energy, as in Fig. EJ here we 
scale by the 2.5%, 50% and 97.5% quantiles of the energy for each frequency bin to indicate 
the uncertainty in the value of the spectral density. 

The gap between the two sinusoids in Region-1 of Fig. El is about 1/N, and the sinusoid 
with the lower frequency has a significantly larger energy than its neighbor, so its frequency 
is inferred with greater accuracy. Region-2 and Region-3 however comprise pairs of consider- 
ably closer sinusoids, making it considerably more difficult to infer their separate frequencies. 
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FIG. 3: Comparison of (a) the real signal energies, (b) the Bayesian spectral density, and (c) the 
classical periodogram of the test data. Note that (a) shows component energies, whereas (b) and (c) 
show energy densities (inferred energy per unit frequency). The peaks generated by the Bayesian 
method are higher than those of the periodogram, reflecting its greater ability to constrain the 
frequency. 

The corresponding uncertainties in their amplitudes can be seen in both Fig. 0] and Fig. 
which shows large energy density values, especially in the area where the frequencies overlap. 
Although spectral estimation is an important topic, for LISA analysis a more pertinent 
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FIG. 4: Joint posteriors for the amplitudes and frequency in Regions- 1, -2 and -3. The circles show 
the true values of the parameters comprising signal pairs. 

issue is the identification of orbital parameters for the compact binary systems generating 
the signals. In our example, we can therefore consider the joint posterior of the frequency 
fi and the energy E = N(A 2 + Bf)/2 for the six sinusoids in our three regions (Fig. |UJ). 
Unlike in Fig. the ordinate now displays an energy rather than an energy density, and the 
contour is at the 95% credible level. 

To better point out the strength of this Bayesian approach, in Fig. we display 95% 
confidence areas for a wider frequency range, including more sinusoids. As stated in J^j], "if 
the signal one is analyzing is a simple harmonic frequency plus noise, then the maximum 
of the periodogram will be the best estimate of the frequency that we can make in the 
absence of additional prior information about it" . This is demonstrated in Fig. [7| showing 
the concordance in frequency estimates for the well-separated sinusoids from both the joint 
posterior and periodogram plots. However, the periodogram peaks are significantly wider 
than the 95% areas, and are clearly sub-optimal for closely spaced sinusoids. In fact the 
posterior probability density for frequency is the exponent of the ratio of the periodogram 
C(f) and the noise variance a 2 25j] . The Bayesian approach therefore takes account of 
the noise variance in the estimation process, which explains why the confidence regions are 
significantly narrower than the periodogram peaks. 

This Bayesian method therefore shows great power when tackling strong signals closely 
spaced in parameter space (in this case, frequency). In addition it delivers confidence inter- 
vals for the parameters (frequency and amplitude) and can take account of relevant prior 
information when applied to LISA data. 
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FIG. 5: Bayesian spectral estimates, showing 95% confidence bounds in spectral density, for the 
three regions considered. Each regions contain two distinct sinusoids, one shown with solid lines 
and the other with dashed lines. The three lines for each sinusoid show the median value (thick) 
and 95% bounds (thin) of the spectral estimate at each frequency. The vertical lines show the true 
frequency values of these components. The dotted line traversing the entire width of the plots is 
the corresponding classical periodogram. 
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FIG. 6: 95% posterior confidence contours of energy and frequency for the six sinusoids considered. 
The empty circles show the median value of the posterior peak, and the filled circles show the true 
value. 

We have seen that Fig. EJEl and El reveal strong interference between very closely spaced 
signal pairs resulting in poor estimation of their parameters. Nevertheless, the Bayesian 
approach succeeds in revealing even these as separate sinusoids, at a level far beyond the 
ability of a classical periodogram. To investigate this further, we ran a series of simulations 
with two sinusoids gradually approaching each other in frequency. The results of these 
simulations are presented in Fig. |S] which shows how the ability of the method to separate 
the signals depends on their strengths, their relative phase and on observing time. In our 
examples we use t G {0, . . . , N — 1}. We ran 10 series of simulations for five different phase 
shifts (0, 7r/4, 7r/2, 37r/4, tt) between the two equal-amplitude sinusoids, and for two different 
signal strengths, with a — 1. During each set of simulations the frequency gap between 
the two sinusoids was increased by 1% of a (l/iV)-step. (Recall that N = 1000 so that 
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FIG. 7: Top: A section of the (energy, frequency) joint posterior probability density containing 
eleven 95% posterior confidence areas. The bottom plot shows the classical periodogram for the 
same frequency range. Twelve sinusoids are actually present in this region, indicated by gray dots 
and one sinusoid, at / = 0.338 720 9, was too weak to be identified by our method (7 = 0.11). 

1/N = 0.001). The left-hand column of Fig. |H] displays the results of runs in which the 
sinusoids have A\ + B\ = A\ + B\ = 2. The right-hand column displays results for a quarter 
of this energy, i.e., A\ + B\ = A\ + B\ = 0.5. A high probability for model 2 indicates 
that the two sinusoids could be separated well. If model 1 has the highest probability then 
the two sinusoids could only be identified as one signal, whereas model indicates that the 
two sinusoids effectively annihilated each other with respect to the observation time and the 
resulting signal is too weak to be identified against the background noise. This last effect 
can only be observed for phase shifts close to n since the amplitudes then have opposite 
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FIG. 8: Model probabilities for different pairs of equal-amplitude sinusoids in gaussian noise (a = 
1). Each plot displays two sinusoids with varying frequency gaps expressed as a percentage of 
one (l/iV)-step (with iV = 1000). Each row corresponds to a different phase shift between the 
sinusoids: from top to bottom the phase shift is 0, ir/4, tt/2, 37r/4, it. The two columns display 
results with different signal strengths, as given in the title of the plots. 



signs and for a frequency gap of around zero almost no resultant amplitude is developed 
within the observation time considered. 

It is obvious that stronger sinusoids are easier to detect and separate, so model 2 is favored 
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with much smaller frequency gaps in the left-hand column of Fig. |H] than in the right-hand 
column. Furthermore, in both the left and right-hand columns the separation ability is 
worse for small phase shifts and generally improves as the phase shift increases, but this 
again depends on the noise level. The most striking effect can be seen in the right column 
at a phase shift of 3ir/4. For a small frequency gap the sinusoids can be detected but not 
separated, then from 14% of a (l/iV)-step model starts to be favored. One might expect an 
increase in the probability of of model 2 at this stage, but in fact their frequency and phase 
separation mean the pair almost annihilate each other over the time period considered, so 
that the resultant cannot be seen against the noisy background. With further separation 
the effect is reduced, and for larger frequency gaps (30% of a (l/iV)-step) model 2 does 
indeed become dominant. This phenomenon can also be seen in the periodogram revealing 
a single peak slowly decreasing in height from 1% to about 19% of a (l/iV)-step. Then the 
signal has the same height as the surrounding noise and is slowly increasing beyond 20% of 
a (l/iV)-step. Note that for a different observing period, say t G {t s tart, • • • , 4tart + N — 1}, 
the results would have been different. However even in the worst case the detection can 
be achieved within 40% of a (l/iV)-step for the weaker sinusoids and within 26% for the 
stronger sinusoids. 



V. DISCUSSION 



In this paper we have presented a Bayesian approach to identifying a large number of 
unknown periodic signals in a set of noisy data. Our reversible jump Markov chain Monte 
Carlo method can be used to estimate the number of signals present in the data, their 
parameters, and the noise level. This method compares favorably with classical spectral 
techniques. Our approach allows for simultaneous detection and parameter estimation, and 
does not require a stopping criterion for determining the number of signals. 

Although the parameters of even strong components are not well determined when they 
are sufficiently close together in frequency, we still obtain useful confidence intervals. Im- 
portantly, the noise level is itself a parameter in the overall fit so that the energy present 
in the data is automatically allocated to either signal or noise. The large uncertainties in 
amplitudes we obtained for the closely-spaced components in our example of 100 sinusoids 
will, for a practical problem, be lessened by a sensible choice of priors. A prior similar to the 
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g-prior, using a diagonal matrix and a hyper-parameter g as a multiplicative factor, might 
be a good choice. For the real LISA data we would use priors for the signal amplitudes that 
would reflect our astrophysical knowledge. 

The motivation for our research is to address the difficulty that LISA will ultimately 
encounter in what is loosely called the confusion problem. LISA may see as many as 100 000 
signals from binary systems in the lmHz to 5mHz band. We therefore view our work as 
a powerful new technique for identifying and characterizing these signals in the LISA data 
stream. 

The work presented here is of course a highly simplified toy problem: we are neglecting 
signal modulation due to LISA's orbit and beam pattern and are not considering an ap- 
propriate data model for LISA. In addition, the signals from compact binary systems differ 
from simple sinusoids. However, we believe that it demonstrates the applicability of the 
approach to LISA data analysis, and the next step is to deal with these more complicated 
signals and to develop a realistic strategy for applying our MCMC methods. We do not 
claim that this will be a trivial extension; in fact, we acknowledge the complexity of the 
situation. However, we believe that MCMC methods, like those presented here, do give a 
realistic strategy for identifying and characterizing the large number of signals, of all types, 
that will exist in LISA data. 

Besides LISA, the methods discussed here are likely to be useful in other fields of study 
where the data contain an unknown number of periodic signals. One timely area of astro- 
physics is in the field of asteroseismology, where attempts are made to measure vibrational 
modes of a star; see for example Handler et al. . We would expect that our method could 
be directly applied to asteroseismic data on stellar oscillations. Similarly, spectroscopic nu- 
clear magnetic resonance studies are often concerned with spectral decomposition into an 
unknown number of components, dependent on the composition of the sample Q, and we 
believe the parameter estimation technique we present could be usefully applied here too. 
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